Spark 3.3: Improve task and job abort handling #6876
aokolnychyi merged 1 commit into apache:master
Conversation
```java
Map<String, String> props = table.properties();
Tasks.foreach(files(messages))
    .executeWith(ThreadPools.getWorkerPool())
    .retry(PropertyUtil.propertyAsInt(props, COMMIT_NUM_RETRIES, COMMIT_NUM_RETRIES_DEFAULT))
```
I don't think it is reasonable to use the commit retry mechanism for deletes. It is the only place where we did this. For now, I added some default configs in SparkCleanupUtil. I doubt we want to make them configurable.
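To illustrate the idea of hard-coded cleanup retry defaults (instead of reading the table's commit retry properties), here is a minimal self-contained sketch. The constant name and helper are hypothetical, not the actual SparkCleanupUtil code; real cleanup would also use Iceberg's Tasks utility rather than a hand-rolled loop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class FixedRetryDeleteDemo {
    // Hypothetical default, mirroring the idea of hard-coding cleanup retry
    // settings instead of reading the table's commit.retry.* properties.
    static final int DELETE_NUM_RETRIES = 3;

    static <T> void deleteWithRetries(List<T> items, Consumer<T> deleteFn) {
        for (T item : items) {
            for (int attempt = 0; ; attempt++) {
                try {
                    deleteFn.accept(item);
                    break;
                } catch (RuntimeException e) {
                    if (attempt >= DELETE_NUM_RETRIES) {
                        // best-effort cleanup: log and move on rather than fail the abort
                        System.out.println("Failed to delete " + item);
                        break;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] failuresLeft = {2}; // simulate two transient failures, then success
        deleteWithRetries(Arrays.asList("s3://bucket/a.parquet"), path -> {
            if (failuresLeft[0]-- > 0) {
                throw new RuntimeException("transient error");
            }
            System.out.println("Deleted " + path);
        });
    }
}
```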
```java
    : ImmutableList.of()));
}
return ImmutableList.of();

private List<DataFile> files(WriterCommitMessage[] messages) {
```
I need a list to know the collection size.
I have a little concern about memory: we are now materializing paths into a List instead of keeping them as an Iterable (if that's what they are originally). I see it's mostly to log sizes; I wonder if we could implement a wrapping counting Iterable for that instead?
I'm okay either way. It seems like we were previously materializing the WriterCommitMessages, which contain the files, anyway. Using S3 as an example, it takes 1 million objects with the worst-case key length of 1024 bytes to use about 1 GB of memory.
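As a back-of-envelope check of that estimate (the class and method names here are just for illustration):

```java
public class PathMemoryEstimate {
    // 1 million paths at S3's maximum key length of 1024 bytes each.
    static double estimateGb(long numPaths, long bytesPerPath) {
        return numPaths * bytesPerPath / 1e9;
    }

    public static void main(String[] args) {
        System.out.printf("~%.2f GB%n", estimateGb(1_000_000L, 1024L));
    }
}
```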
That's true. Actually, @amogh-jahagirdar was wondering: is there a reason we don't have deleteFiles() return the number of deleted files? It would probably be more convenient for callers to log the size that way.
I changed the code to keep a list of files (it shouldn't cost anything extra as those files are already there) and switched to using Lists.transform(), which is a lazy transform, in SparkCleanupUtil.
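For context on why a lazy transform avoids the memory concern: Guava's Lists.transform() returns a view that applies the function on access, so the size is known without copying the paths into a second list. Below is a self-contained sketch of that behavior using a hand-rolled stand-in (the helper and demo names are hypothetical; real code would use Guava directly):

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class LazyTransformDemo {
    // A minimal stand-in for Guava's Lists.transform: a view that applies
    // the function on each get() call instead of materializing results.
    static <F, T> List<T> lazyTransform(List<F> source, Function<F, T> fn) {
        return new AbstractList<T>() {
            @Override
            public T get(int index) {
                return fn.apply(source.get(index));
            }

            @Override
            public int size() {
                return source.size();
            }
        };
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.parquet", "b.parquet");
        List<String> paths = lazyTransform(files, f -> "s3://bucket/" + f);
        // size is available without copying or transforming anything
        System.out.println(paths.size());
        System.out.println(paths.get(1));
    }
}
```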
@szehon-ho @amogh-jahagirdar, could you take another look?
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCleanupUtil.java (outdated, resolved)
```java
}

// the format matches what Spark uses for internal logging
private static String taskInfo() {
```
Nit: what do you think about moving the private method to the bottom? It breaks the flow of the code a bit (I would have liked to see deleteFiles right after deleteTaskFiles, as it is the main delegate).
I was trying to group methods by logic instead of access. My reasoning here was that taskInfo() is only invoked here and is directly related to deleteTaskFiles(). Let me know if that makes sense.
Yeah, it's definitely subjective. I personally prefer to see the public methods and their Javadocs first to get a high-level idea of what the class does before diving into details (especially given there are only two public methods in this class). But as it's a style preference, I'll leave it optional.
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java (outdated, resolved)
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCleanupUtil.java (outdated, resolved)
amogh-jahagirdar left a comment

Thanks @aokolnychyi, great to see this improvement!
```java
try {
  io.deleteFiles(paths);
  LOG.info("Deleted {} file(s) using bulk deletes ({})", paths.size(), context);
```
Nit: unnecessary newline
We do this sometimes when either the try or catch block is non-trivial, to visually separate them.
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java (outdated, resolved)
```java
if (cleanupOnAbort) {
  SparkCleanupUtil.deletePaths("job abort", table.io(), filePaths(messages));
} else {
  LOG.warn("Skipping cleanup of written files, unable to determine the final commit state");
```
The "skipping cleanup of written files" part makes sense to me, but wouldn't "unable to determine the final commit state" apply in both cases (any abort case)? Or are we trying to indicate that we won't be cleaning up any orphan files?
I adapted the original message, but I agree it is a bit odd, as the variable name is generic and does not say anything about commit state. I changed it.
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCleanupUtil.java (outdated, resolved)
```java
return "unknown task";
} else {
  return String.format(
      "partition %d (task %d, attempt %d, stage %d.%d)",
```
Can we show the stage attempt ID better, something like `Task (id: <TaskID>, attempt: <attemptNumber>), Stage (id: <stageId>, attempt: <attemptNumber>)` in place of `(task 0, attempt 0, stage 0.0)`?
My idea is to follow the exact format used in Spark so that we can easily match Spark and Iceberg logs:

```
[Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask - Aborting commit for partition 0 (task 0, attempt 0, stage 0.0)
[Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 2 file(s) (partition 0 (task 0, attempt 0, stage 0.0))
```

In this example, it is clear that these two records belong to the same context, even though they were produced by Spark and Iceberg. If we change the format, it won't be obvious.
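The format under discussion can be sketched as follows. This is a simplified stand-in, not the actual SparkCleanupUtil method: in the real code the values would come from Spark's TaskContext (partition ID, task attempt ID, attempt number, stage ID, and stage attempt number) rather than from parameters, and the demo class name is hypothetical.

```java
public class TaskInfoFormatDemo {
    // Formats task info the same way Spark does in its own log messages,
    // e.g. "partition 0 (task 0, attempt 0, stage 0.0)".
    static String taskInfo(int partitionId, long taskId, int attempt, int stageId, int stageAttempt) {
        return String.format(
            "partition %d (task %d, attempt %d, stage %d.%d)",
            partitionId, taskId, attempt, stageId, stageAttempt);
    }

    public static void main(String[] args) {
        // matches the context in Spark's own abort log line for the same task
        System.out.println(taskInfo(0, 0, 0, 0, 0));
    }
}
```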
...v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestWriteAborts.java (outdated, resolved)
Force-pushed: f669b92 to b73396c
@szehon-ho @amogh-jahagirdar @singhpk234, could you take another look?
szehon-ho left a comment

Looks good to me, small comments for consideration.
```java
/**
 * Attempts to delete as many given files as possible.
 *
 * @param context a helpful description of the context in which this method is invoked
```
Nit: thanks for the Javadoc. How about "a helpful description of the operation invoking this method" (to avoid reusing "context" to define itself)? Not sure it's completely accurate though.
I like it, let me change it.
Force-pushed: b73396c to 5678a5a
Thanks for reviewing, @szehon-ho @singhpk234 @amogh-jahagirdar!
This PR improves our task and job abort handling in Spark 3.3.